
    PlaNeRF: SVD Unsupervised 3D Plane Regularization for NeRF Large-Scale Scene Reconstruction

    Full text link
    Neural Radiance Fields (NeRF) enable 3D scene reconstruction from 2D images and camera poses for Novel View Synthesis (NVS). Although NeRF can produce photorealistic results, it often overfits to the training views, leading to poor geometry reconstruction, especially in low-texture areas. This limitation restricts many important applications that require accurate geometry, such as extrapolated NVS, HD mapping, and scene editing. To address it, we propose a new method to improve NeRF's 3D structure using only RGB images and semantic maps. Our approach introduces a novel plane regularization based on Singular Value Decomposition (SVD) that does not rely on any geometric prior. In addition, we leverage the Structural Similarity Index Measure (SSIM) in our loss design to properly initialize the volumetric representation of NeRF. Quantitative and qualitative results show that our method outperforms popular regularization approaches in accurate geometry reconstruction for large-scale outdoor scenes and achieves SoTA rendering quality on the KITTI-360 NVS benchmark. Comment: 14 pages, 7 figures
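    The paper itself ships no code here; the sketch below illustrates one plausible form of an SVD-based planarity penalty, assuming 3D points are available per image patch (the function name, patch size, and scale-invariant weighting are our assumptions, not details from the paper).

```python
import torch

def planar_svd_loss(points: torch.Tensor) -> torch.Tensor:
    """Penalize deviation of a patch of 3D points from its best-fit plane.

    points: (N, 3) points rendered for the pixels of one semantically
    planar patch. After centering, the best-fit plane is spanned by the
    two leading right-singular vectors, so the smallest singular value
    measures out-of-plane spread; driving it to zero flattens the patch.
    """
    centered = points - points.mean(dim=0, keepdim=True)  # remove the centroid
    s = torch.linalg.svdvals(centered)                    # singular values, descending
    return s[-1] / (s.sum() + 1e-8)                       # scale-invariant planarity ratio

# Hypothetical usage: accumulate over patches labeled planar by the
# semantic maps (road, wall, ...) and add to the photometric loss.
patch_points = torch.randn(256, 3, requires_grad=True)
planar_svd_loss(patch_points).backward()
```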

    CROSSFIRE: Camera Relocalization On Self-Supervised Features from an Implicit Representation

    Full text link
    Beyond novel view synthesis, Neural Radiance Fields are useful for applications that interact with the real world. In this paper, we use them as an implicit map of a given scene and propose a camera relocalization algorithm tailored for this representation. The proposed method computes in real time the precise position of a device equipped with a single RGB camera during its navigation. In contrast with previous work, we do not rely on pose regression or photometric alignment, but rather on dense local features obtained through volumetric rendering that are specialized to the scene with a self-supervised objective. As a result, our algorithm is more accurate than competitors, able to operate in dynamic outdoor environments with changing lighting conditions, and can be readily integrated into any volumetric neural renderer. Comment: Accepted to ICCV 2023
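    As a rough illustration of this kind of pipeline (not the authors' implementation): once per-pixel features can be rendered from the implicit map together with their 3D points, relocalization reduces to descriptor matching followed by a standard PnP solve. All names below are placeholders.

```python
import numpy as np
import cv2

def match_and_localize(query_feats, query_uv, map_feats, map_xyz, K):
    """Match query pixel features against features rendered from the
    implicit map, then recover the camera pose from 2D-3D matches.

    query_feats: (N, D) L2-normalized descriptors at query pixels query_uv (N, 2)
    map_feats:   (M, D) descriptors rendered from the scene representation,
                 with corresponding 3D points map_xyz (M, 3)
    K:           (3, 3) camera intrinsics
    """
    sim = query_feats @ map_feats.T                 # cosine similarities
    nn = sim.argmax(axis=1)                         # best map point per query pixel
    ok, rvec, tvec, inliers = cv2.solvePnPRansac(
        map_xyz[nn].astype(np.float64),
        query_uv.astype(np.float64), K, None)       # robust pose estimation
    return rvec, tvec, inliers
```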

    ImPosing: Implicit Pose Encoding for Efficient Visual Localization

    Full text link
    We propose a novel learning-based formulation for visual localization of vehicles that can operate in real time in city-scale environments. Visual localization algorithms determine the position and orientation from which an image has been captured, using a set of geo-referenced images or a 3D scene representation. Our new localization paradigm, named Implicit Pose Encoding (ImPosing), embeds images and camera poses into a common latent representation with two separate neural networks, such that we can compute a similarity score for each image-pose pair. By evaluating candidates through the latent space in a hierarchical manner, the camera position and orientation are not directly regressed but incrementally refined. Very large environments force competing methods to store gigabytes of map data, whereas our method remains very compact regardless of the reference database size. In this paper, we describe how to effectively optimize our learned modules, how to combine them to achieve real-time localization, and demonstrate results on diverse large-scale scenarios that significantly outperform prior work in accuracy and computational efficiency. Comment: Accepted at WACV 2023
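    A minimal sketch of the image-pose similarity idea described above, under our own assumptions (a 7-DoF pose vector, dot-product scoring, tiny MLP sizes); the hierarchical search is summarized in a comment:

```python
import torch
import torch.nn as nn

class PoseEncoder(nn.Module):
    """Maps a camera pose (x, y, z + quaternion) into the shared latent space."""
    def __init__(self, dim=256):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(7, 128), nn.ReLU(), nn.Linear(128, dim))

    def forward(self, pose):                        # pose: (N, 7)
        return nn.functional.normalize(self.net(pose), dim=-1)

def score_candidates(img_embedding, candidate_poses, pose_encoder):
    """Similarity between one image embedding (D,) and N pose candidates (N, 7);
    a higher score means the pose is more consistent with the image."""
    return pose_encoder(candidate_poses) @ img_embedding   # (N,) scores

# Hierarchical refinement (as described in the abstract): score a coarse
# grid of poses, keep the best candidates, resample more finely around
# them, and repeat; the pose is refined rather than regressed directly.
```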

    MOISST: Multimodal Optimization of Implicit Scene for SpatioTemporal calibration

    Full text link
    With the recent advances in autonomous driving and the decreasing cost of LiDARs, the use of multimodal sensor systems is on the rise. However, in order to make use of the information provided by a variety of complementary sensors, it is necessary to accurately calibrate them. We take advantage of recent advances in computer graphics and implicit volumetric scene representation to tackle the problem of multi-sensor spatial and temporal calibration. Thanks to a new formulation of the Neural Radiance Field (NeRF) optimization, we are able to jointly optimize calibration parameters along with the scene representation based on radiometric and geometric measurements. Our method enables accurate and robust calibration from data captured in uncontrolled and unstructured urban environments, making our solution more scalable than existing calibration solutions. We demonstrate the accuracy and robustness of our method in urban scenes typically encountered in autonomous driving scenarios. Comment: Accepted at IROS 2023. Project site: https://qherau.github.io/MOISST
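    To make the "jointly optimize calibration with the scene" idea concrete, here is a self-contained toy sketch under our own assumptions (a tiny MLP stands in for the NeRF, the rig velocity is assumed known and constant so the time offset has an effect, and all values are placeholders):

```python
import torch

def axis_angle_to_matrix(w):
    """Rodrigues formula: axis-angle vector (3,) -> rotation matrix (3, 3)."""
    theta = torch.sqrt(w.pow(2).sum() + 1e-12)
    k = w / theta
    K = torch.zeros(3, 3)
    K[0, 1], K[0, 2] = -k[2], k[1]
    K[1, 0], K[1, 2] = k[2], -k[0]
    K[2, 0], K[2, 1] = -k[1], k[0]
    return torch.eye(3) + torch.sin(theta) * K + (1 - torch.cos(theta)) * (K @ K)

# Spatiotemporal calibration parameters, optimized jointly with the scene.
rot = torch.full((3,), 1e-3, requires_grad=True)    # sensor-to-rig rotation
trans = torch.zeros(3, requires_grad=True)          # sensor-to-rig translation
dt = torch.zeros(1, requires_grad=True)             # temporal offset
velocity = torch.tensor([1.0, 0.0, 0.0])            # assumed known rig velocity

scene = torch.nn.Sequential(torch.nn.Linear(3, 64), torch.nn.ReLU(),
                            torch.nn.Linear(64, 1)) # stand-in for the NeRF
opt = torch.optim.Adam([{"params": scene.parameters(), "lr": 1e-3},
                        {"params": [rot, trans, dt], "lr": 1e-4}])

# One illustrative step: map sensor-frame points into the common frame
# with the current calibration, then fit the scene to the measurements.
pts_sensor, target = torch.randn(128, 3), torch.randn(128, 1)
pts_world = pts_sensor @ axis_angle_to_matrix(rot).T + trans + dt * velocity
loss = (scene(pts_world) - target).abs().mean()
opt.zero_grad(); loss.backward(); opt.step()
```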

    Vision-based localization from discriminative features derived from heterogeneous visual data

    No full text
    Visual-based Localization (VBL) consists in retrieving the location of a visual image within a known space. VBL is involved in several present-day practical applications, such as indoor and outdoor navigation, 3D reconstruction, etc. The main challenge in VBL comes from the fact that the visual input to localize may have been captured at a different time than the reference database. Visual changes may occur in the observed environment during this period, especially for outdoor localization. Recent approaches use complementary information, such as geometric or semantic information, to address these visually challenging localization scenarios. However, geometric or semantic information is not always available and can be costly to obtain. To free ourselves from any extra modality when solving challenging localization scenarios, we propose to use a modality transfer model capable of reproducing the underlying scene geometry from a monocular image. First, we cast the localization problem as a Content-Based Image Retrieval (CBIR) problem and train a CNN image descriptor with radiometry-to-dense-geometry transfer as a side training objective. Once trained, our system can be used on monocular images alone to construct an expressive descriptor for localization in challenging conditions. Second, we introduce a new relocalization pipeline to improve the pose given by our initial localization step. In the same manner as our global image descriptor, the relocalization is aided by the geometric information learned during an offline stage. The extra geometric information is used to constrain the final pose estimation of the query. Through comprehensive experiments, we demonstrate the effectiveness of our proposals for both indoor and outdoor localization.
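    As a minimal illustration of the CBIR formulation of localization (the database layout and names below are illustrative, not taken from the thesis): once global descriptors exist, the initial localization step is a nearest-neighbour search over geo-referenced descriptors, whose result is then refined by the geometry-aware second stage.

```python
import numpy as np

def localize_by_retrieval(query_desc, ref_descs, ref_poses, k=1):
    """Approximate the query pose by the pose(s) of its nearest reference(s).

    query_desc: (D,) L2-normalized global descriptor of the query image
    ref_descs:  (N, D) L2-normalized descriptors of geo-referenced images
    ref_poses:  (N, 7) poses (position + quaternion) of the references
    """
    sims = ref_descs @ query_desc        # cosine similarity to every reference
    top = np.argsort(-sims)[:k]          # indices of the k best matches
    return ref_poses[top], sims[top]
```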

    Collaborative localization and formation flying using distributed stereo-vision

    No full text
    This paper considers collaborative stereo-vision as a means of localization for a fleet of micro-air vehicles (MAVs) equipped with monocular cameras, inertial measurement units, and sonar sensors. A sensor fusion scheme using an extended Kalman filter is designed to estimate the positions and orientations of all the vehicles from these distributed measurements. The estimation is complemented by a formation control law that maximizes the overlapping fields of view of the vehicles. Experimental tests of the complete perception and control loop have been performed on multiple MAVs with centralized processing on a ROS ground station.
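    For reference, the core of such a fusion scheme is the standard extended Kalman filter cycle sketched below (a generic textbook form with placeholder models, not the paper's filter, which fuses inertial, sonar, and distributed stereo measurements across the fleet):

```python
import numpy as np

def ekf_step(x, P, u, z, f, h, F, H, Q, R):
    """One predict/update cycle of an extended Kalman filter.

    x, P : state mean and covariance
    u, z : control input and measurement
    f, h : motion and measurement models (callables)
    F, H : their Jacobians at the current estimate (callables)
    Q, R : process and measurement noise covariances
    """
    x_pred = f(x, u)                                  # predict state
    Fx = F(x, u)
    P_pred = Fx @ P @ Fx.T + Q                        # predict covariance
    Hx = H(x_pred)
    S = Hx @ P_pred @ Hx.T + R                        # innovation covariance
    K = P_pred @ Hx.T @ np.linalg.inv(S)              # Kalman gain
    x_new = x_pred + K @ (z - h(x_pred))              # correct with measurement
    P_new = (np.eye(len(x)) - K @ Hx) @ P_pred
    return x_new, P_new
```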

    Vision-Based Localization: on the heterogeneity of approaches and data

    No full text
    Nowadays, a wide variety of data is available about the places around us. These data can be of very different natures: a collection of images, a 3D model, a colorized point cloud, etc. When GPS is unavailable, this information can be very useful for localizing an agent within its environment, provided that the agent can itself acquire information through a vision system. This is known as Vision-Based Localization (VBL). Owing to the great heterogeneity of the data acquired and known about the environment, a large body of work addresses this problem. This article reviews recent methods for localizing a vision system from prior knowledge of the environment in which it operates.

    Learning auxiliary modalities for vision-based localization

    No full text
    In this paper we present a new framework for training with a side modality to enhance image-based localization. In order to learn side-modality information, we train a fully convolutional decoder network that transfers meaningful information from one modality to another. We validate our approach on a challenging urban dataset. Experiments show that our system is able to enhance a purely image-based system by properly learning the appearance of a side modality. Compared to state-of-the-art methods, the proposed network is lighter and faster to train, while producing comparable results.
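    A minimal sketch of this training setup under our own assumptions (layer sizes, depth as the side modality, and the L1 reconstruction loss are illustrative choices, not values from the paper):

```python
import torch
import torch.nn as nn

class SideModalityNet(nn.Module):
    """Shared encoder with two heads: a global descriptor for retrieval
    and a decoder that reconstructs the side modality (e.g. depth)."""
    def __init__(self, dim=256):
        super().__init__()
        self.encoder = nn.Sequential(
            nn.Conv2d(3, 32, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(32, dim, 3, stride=2, padding=1), nn.ReLU())
        self.decoder = nn.Sequential(
            nn.ConvTranspose2d(dim, 32, 4, stride=2, padding=1), nn.ReLU(),
            nn.ConvTranspose2d(32, 1, 4, stride=2, padding=1))

    def forward(self, img):
        feat = self.encoder(img)                       # (B, dim, H/4, W/4)
        desc = feat.mean(dim=(2, 3))                   # global average pooling
        desc = nn.functional.normalize(desc, dim=-1)   # retrieval descriptor
        side = self.decoder(feat)                      # predicted side modality
        return desc, side

# Side-modality reconstruction as the auxiliary objective; a retrieval
# loss on desc (e.g. triplet) would be added with a weighting factor,
# and the decoder can be discarded at test time.
net = SideModalityNet()
img, depth_gt = torch.randn(2, 3, 64, 64), torch.randn(2, 1, 64, 64)
desc, depth_pred = net(img)
aux_loss = nn.functional.l1_loss(depth_pred, depth_gt)
```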

    Learning Scene Geometry for Visual Localization in Challenging Conditions

    No full text
    We propose a new approach for large-scale outdoor image-based localization that can deal with challenging scenarios such as cross-season, cross-weather, day/night, and long-term localization. The key component of our method is a new learned global image descriptor that can effectively benefit from scene geometry information during training. At test time, our system is capable of inferring the depth map related to the query image and using it to increase localization accuracy. We increase recall@1 performance by 2.15% on a cross-weather and long-term localization scenario and by 4.24 percentage points on a challenging winter/summer localization sequence compared to state-of-the-art methods. Our method can also use weakly annotated data to localize night images against a reference dataset of daytime images.
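    For context, recall@1 (the metric quoted above) counts a query as localized when a correct reference appears among its top retrieved matches; a minimal version, with the correctness matrix assumed given by ground-truth distance thresholds:

```python
import numpy as np

def recall_at_k(query_descs, ref_descs, correct, k=1):
    """Fraction of queries whose top-k retrieved references contain a
    correct match.

    query_descs: (Q, D), ref_descs: (N, D), both L2-normalized
    correct: (Q, N) boolean matrix, correct[i, j] = True if reference j
             is an acceptable localization for query i (e.g. within a
             distance threshold of the ground-truth position)
    """
    sims = query_descs @ ref_descs.T                # (Q, N) similarities
    topk = np.argsort(-sims, axis=1)[:, :k]         # k best references per query
    hits = np.take_along_axis(correct, topk, axis=1).any(axis=1)
    return hits.mean()
```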

    Improving Image Description with Auxiliary Modality for Visual Localization in Challenging Conditions

    No full text
    Image indexing for lifelong localization is a key component of a wide range of applications, including robot navigation, autonomous driving, and cultural heritage valorization. The principal difficulty in long-term localization arises from the dynamic changes that affect outdoor environments. In this work, we propose a new approach for large-scale outdoor image-based localization that can deal with challenging scenarios like cross-season, cross-weather, and day/night localization. The key component of our method is a new learned global image descriptor that can effectively benefit from scene geometry information during training. At test time, our system is capable of inferring the depth map related to the query image and using it to increase localization accuracy. We show through extensive evaluation that our method improves localization performance, especially in challenging scenarios where the visual appearance of the scene has changed. Our method is able to leverage both visual and geometric cues from monocular images to create discriminative descriptors for cross-season localization and effective matching of images acquired at different time periods. Our method can also use weakly annotated data to localize night images against a reference dataset of daytime images. Finally, we extend our method to the reflectance modality and compare multi-modal descriptors based respectively on geometry, material reflectance, and a combination of both.
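    A simple late-fusion baseline for combining modality-specific descriptors, sketched under our own assumptions (the concatenation scheme and the weighting alpha are illustrative, not the paper's recipe):

```python
import numpy as np

def fuse_descriptors(geo_desc, refl_desc, alpha=0.5):
    """Combine geometry- and reflectance-based descriptors into a single
    multi-modal descriptor by weighted concatenation."""
    geo = geo_desc / np.linalg.norm(geo_desc)       # normalize each modality
    refl = refl_desc / np.linalg.norm(refl_desc)
    fused = np.concatenate([alpha * geo, (1 - alpha) * refl])
    return fused / np.linalg.norm(fused)            # unit norm for cosine matching
```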